Reducing Reliance on Relevance Judgments for System Comparison by Using Expectation-Maximization
Abstract
Relevance judgments are often the most expensive part of information retrieval evaluation, and techniques for comparing retrieval systems using fewer relevance judgments have received significant attention in recent years. This paper proposes a novel system comparison method using an expectation-maximization algorithm. In the expectation step, real-valued pseudo-judgments are estimated from a set of system results. In the maximization step, new system weights are learned from a combination of a limited number of actual human judgments and system pseudo-judgments for the other documents. The method can work without any human judgments, and is able to improve its accuracy by incrementally adding human judgments. Experiments using TREC Ad Hoc collections demonstrate strong correlations with system rankings using pooled human judgments, and comparison with existing baselines indicates that the new method achieves the same comparison reliability with fewer human judgments.
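The abstract describes the loop only at a high level, so the following is a minimal Python sketch under stated assumptions, not the paper's formulation. It handles a single topic and assumes (1) the E-step sets each unjudged pooled document's pseudo-judgment to the weight-normalized fraction of systems that retrieved it in their top `depth`, with human labels taking precedence where they exist, and (2) the M-step sets each system's weight to its expected precision at `depth` under those real-valued judgments. The helpers `em_system_weights` and `kendall_tau`, the parameters `depth`, `iters`, and `tol`, and the use of Kendall's tau to compare the resulting system ranking with a reference ranking are all hypothetical, illustrative choices.

```python
from typing import Dict, List, Optional


def em_system_weights(
    runs: Dict[str, List[str]],
    human: Optional[Dict[str, int]] = None,
    depth: int = 10,
    iters: int = 100,
    tol: float = 1e-9,
) -> Dict[str, float]:
    """EM-style system weighting for one topic (hypothetical helper, assumed scheme).

    runs:  system name -> ranked list of document ids
    human: document id -> 0/1 human judgment for the (possibly empty) judged subset
    """
    human = human or {}
    top = {s: r[:depth] for s, r in runs.items()}
    # Pool = union of every system's top-`depth` documents.
    pool = {d for ranking in top.values() for d in ranking}
    weights = {s: 1.0 for s in runs}  # start from uniform system weights

    for _ in range(iters):
        # E-step (assumed form): an unjudged document's pseudo-judgment is the
        # weight-normalized fraction of systems that retrieved it; human labels
        # are used as-is wherever they exist.
        total = sum(weights.values()) or 1.0
        judg = {
            d: float(human[d]) if d in human
            else sum(w for s, w in weights.items() if d in top[s]) / total
            for d in pool
        }
        # M-step (assumed form): a system's new weight is its expected
        # precision at `depth` under the current real-valued judgments.
        new = {s: sum(judg[d] for d in top[s]) / depth for s in runs}
        delta = max(abs(new[s] - weights[s]) for s in runs)
        weights = new
        if delta < tol:
            break
    return weights


def kendall_tau(a: List[str], b: List[str]) -> float:
    """Kendall's tau between two strict rankings of the same items."""
    pos = {x: i for i, x in enumerate(b)}
    n = len(a)
    if n < 2:
        return 1.0
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            # +1 if b orders this pair the same way as a, -1 otherwise.
            score += 1 if pos[a[i]] < pos[a[j]] else -1
    return score / (n * (n - 1) / 2)


if __name__ == "__main__":
    runs = {
        "A": ["d1", "d2", "d3"],
        "B": ["d1", "d2", "d5"],
        "C": ["d7", "d8", "d9"],
    }
    # No human judgments: weights come from inter-system agreement alone.
    w0 = em_system_weights(runs, depth=3)
    # A few human judgments can override the consensus.
    w1 = em_system_weights(runs, {"d1": 0, "d2": 0, "d7": 1, "d8": 1}, depth=3)
    r0 = sorted(w0, key=w0.get, reverse=True)
    r1 = sorted(w1, key=w1.get, reverse=True)
    print(r0, r1, kendall_tau(r0, r1))
```

With `human` left empty, the weights come purely from agreement between systems, which mirrors the claim that the method can run without any judgments. Each added label fixes one document's value, so a growing judged set pulls the weights away from pure consensus. In this toy run, judging four documents is enough to move the outlier system C from last to first, which also illustrates a known weakness of agreement-based scoring: a system that disagrees with the majority is ranked low until judgments say otherwise.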
Similar resources
Northeastern University Runs at the TREC13 Crowdsourcing Track
The goal of the TREC 2012 Crowdsourcing Track was to evaluate approaches to crowdsourcing high quality relevance judgments for images and text documents. This paper describes our submission to the Text Relevance Assessing Task. We explored three different approaches for obtaining relevance judgments. Our first two approaches are based on collecting a limited number of preference judgments from ...
On Aggregating Labels from Multiple Crowd Workers to Infer Relevance of Documents
We consider the problem of acquiring relevance judgements for information retrieval (IR) test collections through crowdsourcing when no true relevance labels are available. We collect multiple, possibly noisy relevance labels per document from workers of unknown labelling accuracy. We use these labels to infer the document relevance based on two methods. The first method is the commonly used ma...
Extended Expectation Maximization for Inferring Score Distributions
Inferring the distributions of relevant and nonrelevant documents over a ranked list of scored documents returned by a retrieval system has a broad range of applications including information filtering, recall-oriented retrieval, metasearch, and distributed IR. Typically, the distribution of documents over scores is modeled by a mixture of two distributions, one for the relevant and one for the...
A Ground Truth Inference Model for Ordinal Crowd-Sourced Labels Using Hard Assignment Expectation Maximization
In this paper we propose an iterative approach for inferring a ground truth value of an item from judgments collected from online workers. The method is specifically designed for cases in which the collected labels are ordinal. Our algorithm works by iteratively solving a hard-assignment EM model and later calculating one final expected value after the convergence of the EM procedure.
Inferred AP: Estimating Average Precision with Incomplete Judgments
In this work, we consider the evaluation of retrieval systems using incomplete relevance information. When the document collection is dynamic, as in the case of web retrieval, new documents are added to the collection over time. Hence, the relevance judgments become incomplete, and the judged relevant documents become a smaller random subset of the entire relevant document set. Also, in the cas...
Publication date: 2014